Let's explore the datasets

Explore input dataset

We will use Bitcoin from the Cryptocurrency Historical Prices collection as the target dataset.

The Bitcoin data is at 1-day intervals, starting from April 28, 2013.
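A minimal sketch of loading such a daily CSV with a datetime index. The column names (`Date`, `Open`, `High`, `Low`, `Close`, `Volume`) and the inline sample rows are assumptions based on the dataset's description; the real notebook would point `read_csv` at the downloaded file instead of a string buffer.

```python
import io
import pandas as pd

# Inline stand-in for the real CSV file; column names are assumptions.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Volume\n"
    "2013-04-28,135.30,135.98,132.10,134.21,0\n"
    "2013-04-29,134.44,147.49,134.00,144.54,0\n"
    "2013-04-30,144.00,146.93,134.05,139.00,0\n"
)

# Parse dates and use them as the index so time-based slicing works later.
df = pd.read_csv(sample_csv, parse_dates=["Date"], index_col="Date")
print(df.shape)  # (3, 5)
```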

We will explore the full input dataset.

Feature evaluation over time

We will take only the last 4 years, because they are the most interesting.
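Restricting to the last 4 years can be done with a cutoff on the datetime index. A sketch with a synthetic datetime-indexed frame; the column name `Close` is an assumption.

```python
import pandas as pd

# Synthetic daily frame standing in for the full Bitcoin history.
idx = pd.date_range("2013-04-28", periods=3000, freq="D")
df = pd.DataFrame({"Close": range(3000)}, index=idx)

# Keep only rows within 4 years of the most recent observation.
cutoff = df.index.max() - pd.DateOffset(years=4)
recent = df[df.index > cutoff]
print(recent.index.min(), "->", recent.index.max())
```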

Check the dependence of trading volume and price on the date within the year and the time of day.

First, define a function to display frequency distributions.
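One plausible shape for such a helper is a histogram plot. This is a sketch, not the notebook's actual function: the name `show_frequency`, the bin count, and the use of matplotlib are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

def show_frequency(values, title, bins=50):
    """Plot a histogram of `values` and return the bin counts and edges."""
    fig, ax = plt.subplots()
    counts, edges, _ = ax.hist(values, bins=bins)
    ax.set_title(title)
    ax.set_xlabel(title)
    ax.set_ylabel("frequency")
    return counts, edges

# Usage on synthetic data standing in for the price column.
counts, edges = show_frequency(
    np.random.default_rng(0).normal(size=1000), "price")
print(int(counts.sum()))  # every sample lands in some bin -> 1000
```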

Frequency of price

Frequency of transaction volume

Compare train and test datasets

Training data exploration

Testing data exploration

Normalise data

The dataset is not stationary.

This means that there is a structure in the data that is dependent on the time. Specifically, there is an increasing trend in the data.

Stationary data is easier to model and will very likely result in more skillful forecasts.

A standard way to remove a trend is by differencing the data. That is, the observation from the previous time step (t-1) is subtracted from the current observation (t). This removes the trend, and we are left with a difference series: the changes to the observations from one time step to the next.
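Differencing and its inverse can be sketched with `pandas.Series.diff` and a cumulative sum; the toy values below are illustrative only.

```python
import pandas as pd

# First-order differencing: diff[t] = series[t] - series[t-1].
series = pd.Series([100.0, 102.0, 105.0, 103.0, 108.0])
diff = series.diff().dropna()  # first element has no predecessor
print(diff.tolist())  # [2.0, 3.0, -2.0, 5.0]

# The original series (minus its first point) can be recovered by
# cumulatively summing the differences back onto the first observation.
restored = diff.cumsum() + series.iloc[0]
print(restored.tolist())  # [102.0, 105.0, 103.0, 108.0]
```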

The default activation function for LSTMs is the hyperbolic tangent (tanh), which outputs values between -1 and 1. This is the preferred range for the time series data.

To make the experiment fair, the scaling coefficients (min and max) values must be calculated on the training dataset and applied to scale the test dataset and any forecasts. This is to avoid contaminating the experiment with knowledge from the test dataset, which might give the model a small edge.

We can transform the dataset to the range [-1, 1] using the MinMaxScaler class.
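A sketch of the fit-on-train, transform-only-on-test discipline described above, using toy arrays in place of the real price series:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[25.0], [40.0]])  # 40 lies outside the training range

# Scaling coefficients (min and max) come from the training data only.
scaler = MinMaxScaler(feature_range=(-1, 1))
train_scaled = scaler.fit_transform(train)
test_scaled = scaler.transform(test)  # transform only -- no refit

print(train_scaled.ravel())  # [-1.  0.  1.]
print(test_scaled.ravel())   # [0.5 2. ] -- out-of-range values exceed 1
```

Note that test values outside the training range scale beyond [-1, 1]; that is expected and is exactly why fitting on the test set would leak information.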

Check the window generator
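The core idea of a window generator is to slice the series into (input, label) pairs. A NumPy sketch of that idea, not the notebook's actual generator class (names and defaults here are assumptions):

```python
import numpy as np

def make_windows(series, input_width, label_width=1):
    """Slide a window over `series`: each sample is `input_width` past
    values (X) and the next `label_width` values (y)."""
    X, y = [], []
    for start in range(len(series) - input_width - label_width + 1):
        X.append(series[start:start + input_width])
        y.append(series[start + input_width:start + input_width + label_width])
    return np.array(X), np.array(y)

# Usage: 10 points, 3-step inputs, 1-step labels.
series = np.arange(10.0)
X, y = make_windows(series, input_width=3)
print(X.shape, y.shape)  # (7, 3) (7, 1)
print(X[0], y[0])        # [0. 1. 2.] [3.]
```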

Try a baseline model
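A common baseline for time series is the persistence forecast: predict that the next value equals the current one. Any trained model should beat this. A minimal sketch with toy data:

```python
import numpy as np

def persistence_forecast(series):
    """Prediction for step t is simply the observation at step t-1."""
    return series[:-1]

series = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
preds = persistence_forecast(series)
targets = series[1:]

# Mean absolute error of the naive baseline.
mae = np.mean(np.abs(preds - targets))
print(mae)  # 1.5
```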

Try plotting the model's predictions

Explore training metrics